We will use some some network packages (mainly to illustrate)

as well as some packages for spatial analysis, and visualization

For technical reasons, we need additional packages

In the applications, we will use gurobi. It is a commercial optimization solver for linear programming, quadratic programming (among many other), that is available for free in academia. Installation can be a little bit complicated (see the vignette online),

We will consider three applications in this guide

  • the toy example just to explain the four components in our approach : the nodes and the edges (a road network for instance), the location of sellers and buyers. The first part is to match locations of sellers and buyers with nodes. Then we solve the linear programming problem.
  • the metro example to illustrate the use of a real network (here the metro in Paris), with a single buyer and several sellers (hospitals here)
  • the maternity example to illustrate on read data, with all maternities in France (sellers) and all pregnant women (buyers).

1 A Toy Example

1.1 Networks and Shapefiles

Consider a shapefile of roads, or train lines : each road is a collection of segments, or connected nodes. Our shapefile is, so fact, only in a dataframe dd

Here are the (given) knots and the four roads

Let us create a shapefile from those segments

Show code for the extend_shp() function

Intersections of roads are now added, as knots of the network

We can also visualize all knots of the network

1.3 Minimal Distance (Euclidean)

Closest distance Demand-Supply

FALSE     X   Y   Type
FALSE 4 6.5 5.5 Supply

1.4 Minimal Distance on a Network

1.5 Using igraph

One can use library(igraph) (see http://igraph.org) to find the closest supply location

FALSE The graph is connex : TRUE

If the graph is not connex, we need to keep the largest strongly connected subgraph

We need to keep the largest strongly connected subgraph

FALSE          [,1]     [,2]     [,3]
FALSE [1,] 542.9799 253.2018 772.7032
FALSE [2,] 425.6273 519.5013 246.6023

1.6 Using gurobi

One can use library(gurobi) (see http://gurobi.com/) to solve a linear programing problem. Very generally, the gurobi interface enables to solve problems of the form \(\mathbf{x}^{\text{T}}Q\mathbf{x}+c^{\text{T}}\mathbf{x}+\alpha\) subject to bound constraints \(\mathbf{\ell}\leq\mathbf{x}\leq \text{u}\), some quadratic constraints \(\mathbf{x}^{\text{T}}Qc\mathbf{x}+q^{\text{T}}\mathbf{x}\leq\beta\), some integrality constraints, some ordered set constraints, etc.

Here,the syntax will be the following

  • a sparse matrix \(\nabla\) - Nabla - of size the number of edges (raws) times the number of nodes (columns). Each raw contains zeros, except for the starting node (+1) and the arrival node (-1). Note that \(|\nabla^{\text{T}}|\) is the incidence matrix of the network,
  • a cost vector \(\Phi\) - Phi - of size the number of edges. Here, we use a linear cost function, so \(\Phi\) is the vector of length (in km) of edges,
  • an in/out flow vector \(n\) - n - of size the number of nodes
  • an objective called modelsense : here we want to maximize the flow.

result = gurobi ( list(A=t(Nabla),obj=Phi,modelsense='max',rhs=n,sense='=',start=matrix(0,nbArcs,1)), params=NULL)

The outputs of the gurobi call will be

  • the flows on each edge \(\pi\), pi = result$x
  • the prices for each node \(p\) (in the same unit as matrix \(\Phi\))
FALSE [1] 0
FALSE Optimize a model with 13 rows, 26 columns and 52 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+01, 6e+02]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [2e+00, 3e+00]
FALSE Presolve removed 10 rows and 20 columns
FALSE Presolve time: 0.26s
FALSE Presolved: 3 rows, 6 columns, 12 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -1.6954446e+03   4.000000e+00   0.000000e+00      0s
FALSE        2   -1.9682155e+03   0.000000e+00   0.000000e+00      1s
FALSE 
FALSE Solved in 2 iterations and 0.66 seconds
FALSE Optimal objective -1.968215514e+03
FALSE $status
FALSE [1] "OPTIMAL"
FALSE 
FALSE $runtime
FALSE [1] 0.6583691
FALSE 
FALSE $itercount
FALSE [1] 2
FALSE 
FALSE $baritercount
FALSE [1] 0
FALSE 
FALSE $nodecount
FALSE [1] 0
FALSE 
FALSE $objval
FALSE [1] -1968.216
FALSE 
FALSE $x
FALSE  [1] 3 1 0 1 0 0 3 0 0 0 2 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0
FALSE 
FALSE $slack
FALSE  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
FALSE 
FALSE $rc
FALSE  [1]    0.0000    0.0000 -383.6504    0.0000 -383.6521    0.0000    0.0000
FALSE  [8]    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 -246.6023
FALSE [15] -246.8586 -912.8559 -161.8896    0.0000 -750.9680 -506.4036 -295.1692
FALSE [22] -703.7146 -246.8586 -246.6023 -117.6091 -195.9041
FALSE 
FALSE $pi
FALSE  [1]  246.7304715  123.4293099    0.0000000 -264.6027722  -80.9448213
FALSE  [6]  110.8812242  364.0830417  -36.7033854 -388.5606944  246.8586198
FALSE [11]    0.1281483  -22.1402962 -178.8968596
FALSE 
FALSE $vbasis
FALSE  [1]  0  0 -1  0 -1  0  0  0  0  0  0  0  0 -1 -1 -1 -1  0 -1 -1 -1 -1 -1
FALSE [24] -1 -1 -1
FALSE 
FALSE $cbasis
FALSE  [1] -1 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
FALSE 
FALSE $objbound
FALSE [1] -1968.216

The legend for those prices is here

FALSE     X   Y   Type       Xk       Yk NO        price
FALSE 1 5.5 4.0 Demand 5.000000 4.000000  7  364.0830417
FALSE 2 3.5 9.0 Demand 3.000000 9.000000  1  246.7304715
FALSE 3 6.5 8.0 Supply 6.000000 8.000000 13 -178.8968596
FALSE 4 6.5 5.5 Supply 6.263158 5.894737  6  110.8812242
FALSE 5 4.5 9.0 Supply 5.000000 9.000000 11    0.1281483

2 Subway in Paris

For this application, we will use network data of Paris’ Metro with a dataset of edges/arcs and a dataset with locations of nodes,

FALSE [1] 868   3
FALSE [1] 366   6

The following code can be used to visualize our network, with background information obtained from OpenStreetMap, as described in caRtosm’s github page.

Show code to import Paris background information

The following function use use to create a background to visualize the metro network

Consider for instance locations of some hospitals in Paris (one could actually use the geocode function of library(ggmap), from the adresses of hospitals, but here, we have locations, whith longitudes - lon - and latitudes - lat)

and consider someone sick, located Place Denfert-Rochereau

We can plot them on the map

Let us move all those locations to the closest subsway station (the nodes of our network)

Show code

2.1 Using igraph

Using library(igraph), it is possible to compute distances, for instance to get to the first hospital in our list,

or the third one

Distances are respectively

FALSE          [,1]
FALSE [1,] 2872.404
FALSE          [,1]
FALSE [1,] 6029.571

On a graph, we get

Here is the closest hopital (here the th one), for our sick agent

FALSE [1] 5

2.2 Using gurobi

As discussed in the previous example, it is necessary to reshape the data to fit with inputs of the gurobi function. Very generally, the gurobi interface enables to solve problems of the form \(\mathbf{x}^{\text{T}}Q\mathbf{x}+c^{\text{T}}\mathbf{x}+\alpha\) subject to bound constraints \(\mathbf{\ell}\leq\mathbf{x}\leq \text{u}\), some quadratic constraints \(\mathbf{x}^{\text{T}}Qc\mathbf{x}+q^{\text{T}}\mathbf{x}\leq\beta\), some integrality constraints, some ordered set constraints, etc. We have here

  • a sparse matrix \(\nabla\) - Nabla - of size the number of edges (raws) times the number of nodes (columns). Each raw contains zeros, except for the starting node (+1) and the arrival node (-1). Note that \(|\nabla^{\text{T}}|\) is the incidence matrix of the network,
  • a cost vector \(\Phi\) - Phi - of size the number of edges. Here, we use a linear cost function, so \(\Phi\) is the vector of length (in km) of edges,
  • an in/out flow vector \(n\) - n - of size the number of nodes
  • an objective called modelsense : here we want to maximize the flow.

result = gurobi ( list(A=t(Nabla),obj=Phi,modelsense='max',rhs=n,sense='=',start=matrix(0,nbArcs,1)), params=NULL)

The outputs of the gurobi call will be

  • the flows on each edge \(\pi\), pi = result$x
  • the prices for each node \(p\) (in the same unit as matrix \(\Phi\))
FALSE Optimize a model with 366 rows, 868 columns and 1736 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+00, 6e+03]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 1e+00]
FALSE Presolve removed 300 rows and 600 columns
FALSE Presolve time: 0.00s
FALSE Presolved: 66 rows, 268 columns, 536 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -0.0000000e+00   2.000000e+00   0.000000e+00      0s
FALSE       31   -6.0295714e+03   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 31 iterations and 0.00 seconds
FALSE Optimal objective -6.029571414e+03
FALSE [1] "OPTIMAL"
FALSE [1] 6029.571
FALSE [1] 231 276 763 852 854 856 859 862
FALSE [1]  78 359 360 361 362 316 363  93  37

2.3 Voronoi sets

To conclude, notice that it is also possible to create some sort of Voronoi diagrams. The classical one is based on euclidean distances,

Now, if we use subway-based distances

Show code

2.4 Min-Cost Flow

Assume there is one sick person in each metro station. Suppose also that hospitals admit the same number of sick people.

FALSE  [1] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 44.75 -1.00 -1.00 -1.00
FALSE [12] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [23] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [34] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [45] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [56] -1.00 -1.00 44.75 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [67] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [78] -1.00 -1.00 -1.00 -1.00 44.75 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [89] -1.00 -1.00 -1.00
FALSE [1] 0

Then we use a min-cost flow algorithm

FALSE Optimize a model with 366 rows, 868 columns and 1736 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+00, 6e+03]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 4e+01]
FALSE Presolve removed 147 rows and 294 columns
FALSE Presolve time: 0.00s
FALSE Presolved: 219 rows, 574 columns, 1148 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -1.0169517e+06   7.030000e+02   0.000000e+00      0s
FALSE      317   -1.7186206e+06   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 317 iterations and 0.01 seconds
FALSE Optimal objective -1.718620640e+06

The black points below are the hospitals… colors here are related to the price, from (dark) red to (dark) blue

The legend for those prices is here

Show code

3 Natality

In this section, we plan to match pregnent women (demand) and maternities (supply), in France.

3.1 Preliminary work on the datasets

  • Shapefile of roads, where roads are the edges of the network, and intersections are the nodes, obtained from site. Two files arcs.csv and nodes.csv were created.
  • Maternities details where obtained from data.gouv (to get the locations of all maternitives) and data.drees.sante.gouv.fr to get yearly statistics. From those files, three files were created: PERINAT_2016r.csv with birth counts, ID_2016r.csv with information about maternities, and etalab_stock_et_20181231.csv with spatial locations of hospitals,
  • Births in France, where obtained from site, and has been transfered to two files naissance_commune.csv and naissance_arrondissement.csv

For visualisation, we will use various shapefiles

  • Shapefiles of communes (cities) where obtained from actualitix.com
  • Shapefiles of departements where obtained from actualitix.com There is one shapefile per departement

3.1.2 Birth dataset

The birth file contains the number of births per city (ZIP code) in France. A first step can be to visualize those data,

Show code

We can get the number of birth per city (commune) even within Paris (per arrondissement)

Show code

It is also possible to zoom in

Show code

3.1.3 Births and Network

Consider the Hauts de Seine (92) departement

For each polygon, we have the number of birth, for a given year. It is then possible to allocate them to knots inside each polygon.

3.1.4 Maternity dataset

Information is downloaded from data.gouv

Show code

We have a collection of 578 maternities,

FALSE 'data.frame': 578 obs. of  10 variables:
FALSE  $ FI       : Factor w/ 664 levels "010000024","010000032",..: 1 2 3 4 6 7 8 9 10 11 ...
FALSE  $ ACCUN    : int  2112 413 566 801 1233 882 1089 599 600 574 ...
FALSE  $ ACCMU    : int  40 5 3 5 18 12 12 7 8 7 ...
FALSE  $ dep      : Factor w/ 101 levels "01","02","03",..: 1 1 1 1 2 2 2 2 2 2 ...
FALSE  $ COMINSEE : Factor w/ 1794 levels "01004","01034",..: 3 2 11 1 32 28 33 23 22 32 ...
FALSE  $ coordx   : num  871098 908660 902284 882724 719368 ...
FALSE  $ coordy   : num  6569398 6521504 6578385 6542588 6973708 ...
FALSE  $ birth    : int  2152 418 569 806 1251 894 1101 606 608 581 ...
FALSE  $ NEAR_FID : int  151065 194305 206357 90300 39307 90743 42038 15097 158438 193937 ...
FALSE  $ ID_RTE500: int  151065 194305 206357 90300 39307 90743 42038 15097 158438 193937 ...

FALSE dpt
FALSE  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 
FALSE  4  6  5  2  2 11  4  3  2  3  4  5 17  8  4  4  7  4  4  5  6  1  4  4  6 
FALSE 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
FALSE  7  5  9  6  8  2 12  8  7  3  7 10  3  3  4  8  2  7  5  3  4  1  5  6  7 
FALSE 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 
FALSE  3  3  6  3  7  8  3 22  6  4 12  4  7  4  3  8  8 16  1  5  3  5  7 19 11 
FALSE 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 
FALSE 10 11  3  4  4  3  9  7  7  3  4  5  3  1 13 13 14  9 11

3.2 Matching Homes and Maternities

The first step is a transformation of the edge file (some roads are one way, while others are two way - we duplicate the two way road)

Then transformation of the node file (names of nodes)

A transformation of the birth (demand) file (again, the names of variables)

And a transformation of the maternity (supply) file

We do not have homes of pregnent mothers, only their city (code postal). Sometimes, nodes can be on two cities. For technical reasons, it is necessary to keep nodes only once.

FALSE [1] 980
FALSE [1] 320763      5
FALSE [1] 310742
FALSE [1] 611002      5

Let us check the connexity of the graph

FALSE The graph is connex : FALSE

Unfortunately, the graph is not connected. We will keep the largest strongly connected subgraph,

Show code

We have now 564 maternities.

Now, we need to balance the flow : the number of pregnent mothers should be equal to the number of deliveries in maternies, again for technical reasons. But the first step is to allocate births within a city among all the nodes. We will do that uniformely

For instance, in the Hauts-de-Seine region, we have

Observe that the number of birth per node is not necessarily an integer, but it is not (technically at least) a major issue here.

Let us now balance to insure that the in-coming flow equals the out-coming flow

Number of births in the demand dataset (number of pregnent mothers) is 7.3893310^{5}, while the number of births according to the maternity dataset should be 806341. The difference is here significant (9.1%), but since the goal here is to illustrate the technique, let us no discuss further this issue here.

This will be our balanced in-coming flows

In order to allocate mothers to hospital, define the following shortest path function shortestPath(arcs, nodes, originNode, destinationNode), based on gurobi.

Show code for the shortestPath() function

For instance, consider two random nodes, 41 as the starting one, and 50115 as the destination one. Here is the shortest road,

FALSE Optimize a model with 209105 rows, 605318 columns and 1210636 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [1e-02, 4e+01]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 1e+00]
FALSE 
FALSE Concurrent LP optimizer: dual simplex and barrier
FALSE Showing barrier log only...
FALSE 
FALSE Presolve removed 46809 rows and 93850 columns
FALSE Presolve time: 2.16s
FALSE Presolved: 162296 rows, 511468 columns, 1022930 nonzeros
FALSE 
FALSE Ordering time: 0.00s
FALSE 
FALSE Barrier statistics:
FALSE  AA' NZ     : 2.519e+05
FALSE  Factor NZ  : 2.634e+06 (roughly 300 MBytes of memory)
FALSE  Factor Ops : 9.147e+07 (less than 1 second per iteration)
FALSE  Threads    : 1
FALSE 
FALSE                   Objective                Residual
FALSE Iter       Primal          Dual         Primal    Dual     Compl     Time
FALSE    0  -6.07058023e+05  1.27199149e+00  4.37e+00 2.60e+00  1.51e+01     3s
FALSE    1  -4.55634553e+05 -2.60680691e+01  5.72e-01 4.99e+00  4.08e+00     4s
FALSE    2  -2.14918787e+05 -4.83183251e+01  2.04e-02 2.84e-01  5.55e-01     4s
FALSE    3  -3.53126137e+04 -1.04241845e+02  2.02e-03 5.19e-02  7.74e-02     4s
FALSE    4  -1.42523713e+04 -2.11013242e+02  7.07e-04 2.36e-02  2.94e-02     4s
FALSE    5  -5.25404704e+03 -3.04486913e+02  2.10e-04 1.42e-02  1.02e-02     5s
FALSE    6  -2.94902504e+03 -4.90216105e+02  9.32e-05 1.09e-02  4.98e-03     5s
FALSE    7  -2.00978381e+03 -6.19302115e+02  5.03e-05 5.12e-03  2.78e-03     5s
FALSE    8  -1.50040656e+03 -6.49211353e+02  2.84e-05 3.34e-03  1.70e-03     5s
FALSE    9  -1.31510556e+03 -6.66366203e+02  2.09e-05 2.23e-03  1.29e-03     6s
FALSE   10  -1.19957750e+03 -6.87874872e+02  1.64e-05 1.98e-03  1.02e-03     6s
FALSE   11  -1.09070090e+03 -6.92678533e+02  1.24e-05 3.36e-03  7.89e-04     6s
FALSE   12  -9.27352798e+02 -7.06689238e+02  6.60e-06 1.51e-03  4.36e-04     6s
FALSE   13  -7.48638047e+02 -7.09681943e+02  9.05e-07 8.38e-04  7.69e-05     6s
FALSE   14  -7.14444353e+02 -7.13293883e+02  8.96e-09 8.45e-06  2.25e-06     6s
FALSE   15  -7.13341458e+02 -7.13338658e+02  1.18e-11 1.10e-13  5.48e-09     7s
FALSE   16  -7.13339997e+02 -7.13339997e+02  2.78e-11 1.10e-13  1.25e-14     7s
FALSE 
FALSE Barrier solved model in 16 iterations and 6.88 seconds
FALSE Optimal objective -7.13339997e+02
FALSE 
FALSE Crossover log...
FALSE 
FALSE   161975 DPushes remaining with DInf 0.0000000e+00                 7s
FALSE        0 DPushes remaining with DInf 6.2022124e-12                 9s
FALSE 
FALSE        1 PPushes remaining with PInf 0.0000000e+00                 9s
FALSE        0 PPushes remaining with PInf 0.0000000e+00                 9s
FALSE 
FALSE   Push phase complete: Pinf 0.0000000e+00, Dinf 6.2022124e-12      9s
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE   161979   -7.1334000e+02   0.000000e+00   0.000000e+00     10s
FALSE 
FALSE Solved with barrier
FALSE Solved in 161979 iterations and 9.79 seconds
FALSE Optimal objective -7.133399975e+02

Here, maternities are distributed as follows

For the min-cost flow analysis, use the following code

FALSE [1] 2.388758e-10

The in-coming and out-coming flows are here equal so we can use gurobi()

FALSE Optimize a model with 209105 rows, 605318 columns and 1210636 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [1e-02, 4e+01]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [5e-02, 1e+03]
FALSE 
FALSE Concurrent LP optimizer: dual simplex and barrier
FALSE Showing barrier log only...
FALSE 
FALSE Presolve removed 12131 rows and 23783 columns
FALSE Presolve time: 2.38s
FALSE Presolved: 196974 rows, 581535 columns, 1163064 nonzeros
FALSE 
FALSE Ordering time: 0.00s
FALSE 
FALSE Barrier statistics:
FALSE  AA' NZ     : 2.890e+05
FALSE  Factor NZ  : 3.003e+06 (roughly 340 MBytes of memory)
FALSE  Factor Ops : 9.776e+07 (less than 1 second per iteration)
FALSE  Threads    : 1
FALSE 
FALSE                   Objective                Residual
FALSE Iter       Primal          Dual         Primal    Dual     Compl     Time
FALSE    0  -1.47600095e+09 -1.18765362e+05  8.80e+03 1.77e+00  2.60e+04     4s
FALSE    1  -1.06983833e+09 -3.10921683e+06  1.52e+03 3.34e+00  6.88e+03     4s
FALSE    2  -4.72815511e+08 -5.82705536e+06  1.15e+02 1.49e-01  9.65e+02     4s
FALSE    3  -1.04226535e+08 -1.20454452e+07  1.31e+01 1.28e-02  1.66e+02     5s
FALSE    4  -6.10772196e+07 -1.69639690e+07  5.62e+00 5.50e-03  7.78e+01     5s
FALSE    5  -4.50557406e+07 -2.08870328e+07  2.98e+00 5.16e-03  4.22e+01     5s
FALSE    6  -3.85653932e+07 -2.22695869e+07  1.91e+00 2.92e-03  2.84e+01     6s
FALSE    7  -3.73820196e+07 -2.31777180e+07  1.72e+00 2.58e-03  2.47e+01     6s
FALSE    8  -3.25649020e+07 -2.38170487e+07  9.47e-01 1.92e-03  1.52e+01     6s
FALSE    9  -2.93672014e+07 -2.49830366e+07  4.48e-01 9.39e-04  7.60e+00     7s
FALSE   10  -2.77523198e+07 -2.54155134e+07  1.99e-01 6.13e-04  4.05e+00     7s
FALSE   11  -2.72155369e+07 -2.58793383e+07  1.18e-01 2.86e-04  2.31e+00     7s
FALSE   12  -2.70252887e+07 -2.60649004e+07  8.97e-02 1.65e-04  1.66e+00     7s
FALSE   13  -2.66488543e+07 -2.61851594e+07  3.55e-02 8.88e-05  8.03e-01     8s
FALSE   14  -2.65325627e+07 -2.62911117e+07  1.74e-02 4.14e-05  4.18e-01     8s
FALSE   15  -2.64802350e+07 -2.63627024e+07  9.40e-03 1.13e-05  2.03e-01     8s
FALSE   16  -2.64480568e+07 -2.63829883e+07  5.07e-03 4.82e-06  1.12e-01     9s
FALSE   17  -2.64299368e+07 -2.63951109e+07  2.65e-03 1.73e-06  6.02e-02     9s
FALSE   18  -2.64170805e+07 -2.64031819e+07  9.52e-04 6.86e-10  2.40e-02     9s
FALSE   19  -2.64112650e+07 -2.64074925e+07  2.23e-04 1.72e-10  6.52e-03     9s
FALSE   20  -2.64097741e+07 -2.64086210e+07  5.88e-05 5.27e-11  1.99e-03    10s
FALSE   21  -2.64093544e+07 -2.64089704e+07  1.62e-05 1.83e-11  6.64e-04    10s
FALSE   22  -2.64092458e+07 -2.64091285e+07  5.75e-06 3.69e-12  2.02e-04    10s
FALSE   23  -2.64091952e+07 -2.64091670e+07  1.15e-06 8.53e-13  4.88e-05    10s
FALSE   24  -2.64091831e+07 -2.64091807e+07  1.10e-06 2.87e-10  4.21e-06    11s
FALSE   25  -2.64091817e+07 -2.64091814e+07  9.73e-07 5.49e-10  4.07e-07    11s
FALSE   26  -2.64091816e+07 -2.64091816e+07  3.00e-07 1.20e-10  1.02e-07    12s
FALSE   27  -2.64091816e+07 -2.64091816e+07  1.25e-07 3.02e-11  3.26e-08    12s
FALSE   28  -2.64091816e+07 -2.64091816e+07  4.09e-08 7.53e-12  9.87e-09    12s
FALSE 
FALSE Barrier solved model in 28 iterations and 12.45 seconds
FALSE Optimal objective -2.64091816e+07
FALSE 
FALSE Crossover log...
FALSE 
FALSE     2406 DPushes remaining with DInf 0.0000000e+00                13s
FALSE        0 DPushes remaining with DInf 9.3240346e-12                13s
FALSE 
FALSE     1477 PPushes remaining with PInf 0.0000000e+00                13s
FALSE        0 PPushes remaining with PInf 0.0000000e+00                14s
FALSE 
FALSE   Push phase complete: Pinf 0.0000000e+00, Dinf 9.2009386e-12     14s
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE     3886   -2.6409182e+07   0.000000e+00   0.000000e+00     14s
FALSE 
FALSE Solved with barrier
FALSE Solved in 3886 iterations and 14.43 seconds
FALSE Optimal objective -2.640918159e+07

We can actually zoom in, e.g. on South-East part of France

or on Paris

On the French map, we can plot the most expensive maternity, and the less expensive one, nodes[idx[1],"INSEE_COMM"] (Florensac, H??rault) nodes[idx[2],"INSEE_COMM"] (Chaumont-sur-Tharonne, Loir-et-Cher)

FALSE        INSEE_COMM ID_RTE500  POINT_X POINT_Y POPULATION SUPERFICIE NAISD14
FALSE 172581      75115     14789 649636.5 6860860      238.2        848      NA
FALSE 175058      76351    125832 491562.7 6936314      173.1       4695    2392
FALSE               FI ACCUN ACCMU  dep COMINSEE   coordx  coordy birth index
FALSE 172581 750100208  2978   228   75    75056 649690.2 6860894  3206  TRUE
FALSE 175058      <NA>    NA    NA <NA>     <NA>       NA      NA    NA    NA
FALSE        NBNODES.Freq NAISSANCE
FALSE 172581          103   0.00000
FALSE 175058          131  19.92524

3.3 Overall Perspective

3.3.1 Chloropleth Map

3.3.2 Smoothing

Let us define a grid on France

Show code

where we keep only points within the French territory.

Let us use k-nearest neighbors, on the k closest nodes (here k=20)

Again, we can zoom in, for instance on the south-east of France, with maternities in black dots

or on Paris

4 Generic R Function(s)

4.1 Dataset Preparation

We need four datasets here as inputs

  • the nodes dataset nodes, with locations (latitude, longitude)
  • the edge dataset arcs, with (directed) connexions between nodes
  • the supply dataset supply, with supply locations (latitude, longitude)
  • the demand dataset demand, with demand locations (latitude, longitude)

The tidy(arcs,nodes,supply,demand) function that reshape datasets * to make sure that supply and demand location are located on one node * to make sure that the graph is connected * to balance supply and demand (or skew symmetry and conservation) : the total flow entering must equal the total flow leaving

It will return four clean datasets arcs, nodes, supply and demand.

Show code for the tidy() function Show code for the pricing() function

4.2 Application to Paris Region

Consider the sub-graph of nodes and edges, located in that rectangle,

There are 333 ‘cities’ remaining.

Just to check, just look at

Show code Show code

If we focus only on maternity (there are 24 maternities in this region), we obtain the following map,

4.3 Application to South-East France

Consider the sub-graph of nodes and edges, located in that rectangle,

There are 8216 ‘cities’ remaining.

Just to check, just look at

Show code Show code

If we focus only on maternity (there are 163 maternities in this region - or 156 depending on the dataset we use…), we obtain the following map,

4.4 Border Issue

4.4.1 In/Out Difference

We did mention it previously, but it is necessary here that the sum of deliveries in (pregnant women, birth registered in France) equals the sum of deliveries out (birth in hospitals), \[ \sum_{\text{node } i:\text{ pregnant woman}}\overline{d}_i = \sum_{\text{node } j:\text{ maternity}}d_j \] Unfortunately, because we use two different sources, both not not equal. So it is necessay to use a correction \[ \tilde{d}_k=\frac{\displaystyle{\sum_{\text{node } j:\text{ maternity}}d_j}}{\displaystyle{\sum_{\text{node } i:\text{ pregnant woman}}\overline{d}_i}}\cdot\overline{d}_k \] Unfortunately, this approximation is uniform over all France. To overcome that issue, local pricing was considered.

4.4.2 Local Approaches

Using the generic function mentioned previously, it is possible to compute prices on any territory. The strategy was to dived France in small territories, and to compute prices on each of them (the area below). In order to avoid border problems when we reconcile, we did actually use slighly larger regions.